Unit 01 notes

Events and outcomes

01 Theory

Events and outcomes – informally
  • An event is a description of something that can happen.
  • An outcome is a complete description of something that can happen.

All outcomes are events. An event is usually a partial description. Outcomes are events given with a complete description.

Here ‘complete’ and ‘partial’ are within the context of the probability model.

  • It can be misleading to say that an ‘outcome’ is an ‘observation’.
    • ‘Observations’ occur in the real world, while ‘outcomes’ occur in the model.
    • To the extent the model is a good one, and the observation conveys complete information, we can say ‘outcome’ for the observation.

Notice:

  • Because outcomes are complete, no two distinct outcomes can both happen in a single run of the experiment being modeled.

When an event happens, the fact that it has happened constitutes information.

Events and outcomes – mathematically
  • The sample space is the set of possible outcomes, so it is the set of the complete descriptions of everything that can happen.
  • An event is a subset of the sample space, so it is a collection of outcomes.
  • For mathematicians: some “wild” subsets are not valid events. Problems with infinity and the continuum...
Notation
  • Write $S$ for the set of possible outcomes, $s \in S$ for a single outcome in $S$.

  • Write $A, B, C, \dots \subseteq S$ or $A_1, A_2, \dots \subseteq S$ for some events, subsets of $S$.

  • Write $\mathcal{F}$ for the collection of all events. This is frequently a huge set!

  • Write $|A|$ for the cardinality or size of a set $A$, i.e. the number of elements it contains.

Using this notation, we can consider an outcome $s$ itself as an event by considering the “singleton” subset $\{s\} \subseteq S$ which contains that outcome alone.

02 Illustration

Example - Coin flipping

01 - Coin flipping

Flip a fair coin two times and record both results.

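  • Outcomes: sequences, like $HH$ or $TH$.

  • Sample space: all possible sequences, i.e. the set $S = \{HH, HT, TH, TT\}$.

  • Events: for example:
    • $A = \{HH, HT\}$ = “first was heads”
    • $B = \{HT, TH\}$ = “exactly one heads”
    • $C = \{HH, HT, TH\}$ = “at least one heads”

With this setup, we may combine events in various ways to generate other events:

  • Complex events: for example:

    • $A \cap B = \{HT\}$, or in words: “first was heads” AND “exactly one heads” = “heads-then-tails”. Notice that the last one is a complete description, namely the outcome $HT$.

    • $A \cup B = \{HH, HT, TH\}$, or in words: “first was heads” OR “exactly one heads” = “at least one heads”.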
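
The set operations in this example can be checked by enumeration. The following is a minimal sketch in Python; the two events used ("first was heads" and "exactly one heads") and all variable names are assumptions chosen for illustration.

```python
from itertools import product

# Sample space for two coin flips: every length-2 sequence of H and T.
sample_space = {"".join(seq) for seq in product("HT", repeat=2)}

# Events are subsets of the sample space (illustrative choices).
first_heads = {s for s in sample_space if s[0] == "H"}
exactly_one_heads = {s for s in sample_space if s.count("H") == 1}

both = first_heads & exactly_one_heads    # intersection: AND
either = first_heads | exactly_one_heads  # union: OR

print(sorted(sample_space))
print(sorted(both))
print(sorted(either))
```

The intersection contains a single outcome, so it is a complete description, while the union is still only a partial description.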

Exercise - Coin flipping: counting subsets

02 - Coin flipping: counting subsets

Flip a fair coin five times and record the results.

How many elements are in the sample space? (How big is $S$?)
How many events are there? (How big is $\mathcal{F}$?)

03 Theory

New events from old

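Given two events $A$ and $B$, we can form new events using set operations:

  • $A \cup B$: “$A$ OR $B$” (union)

  • $A \cap B$: “$A$ AND $B$” (intersection)

  • $A^c$: “NOT $A$” (complement)

We also use these terms for events $A$ and $B$:

  • They are mutually exclusive when $A \cap B = \emptyset$, that is, they have no elements in common.

  • They are collectively exhaustive when $A \cup B = S$, that is, when they jointly cover all possible outcomes.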

  • In probability texts, sometimes $A \cap B$ is written “$A, B$” or even (frequently!) “$AB$”.
Rules for sets

Algebraic rules

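  • Associativity: $(A \cup B) \cup C = A \cup (B \cup C)$. Analogous to $(a + b) + c = a + (b + c)$.

  • Distributivity: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$. Analogous to $a(b + c) = ab + ac$.

De Morgan’s Laws

  • $(A \cup B)^c = A^c \cap B^c$

  • $(A \cap B)^c = A^c \cup B^c$

  • In other words: you can distribute “$^c$” but must simultaneously do a switch $\cup \leftrightarrow \cap$.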
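
These rules can be spot-checked on small finite sets. The sketch below assumes a sample space of one die roll and three arbitrary events; the helper `complement` and the chosen sets are hypothetical and only illustrate the identities, they do not prove them.

```python
# Assumed sample space: one roll of a six-sided die.
S = set(range(1, 7))
A = {2, 4, 6}   # "even"
B = {4, 5, 6}   # "at least 4"
C = {1, 2}      # "at most 2"

def complement(event):
    """Complement relative to the sample space S."""
    return S - event

associative = (A | B) | C == A | (B | C)
distributive = A & (B | C) == (A & B) | (A & C)
de_morgan_union = complement(A | B) == complement(A) & complement(B)
de_morgan_intersection = complement(A & B) == complement(A) | complement(B)

print(associative, distributive, de_morgan_union, de_morgan_intersection)
```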

Probability models

04 Theory

Axioms of probability

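A probability measure is a function $P : \mathcal{F} \to \mathbb{R}$ satisfying:

Kolmogorov Axioms:

  • Axiom 1: $P[A] \geq 0$ for every event $A$
    (probabilities are not negative!)

  • Axiom 2: $P[S] = 1$
    (probability of “anything” happening is 1)

  • Axiom 3: additivity for any countable collection of mutually exclusive events:
    $$P[A_1 \cup A_2 \cup A_3 \cup \cdots] = P[A_1] + P[A_2] + P[A_3] + \cdots \quad \text{when } A_i \cap A_j = \emptyset \text{ for all } i \neq j$$

  • Notation: we write $P[A]$ instead of $P(A)$, even though $P$ is a function, to emphasize the fact that $A$ is a set.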
Probability model

A probability model or probability space consists of a triple $(S, \mathcal{F}, P)$:

  • $S$ the sample space

  • $\mathcal{F}$ the set of valid events, where every $A \in \mathcal{F}$ satisfies $A \subseteq S$

  • $P$ a probability measure satisfying the Kolmogorov Axioms

Finitely many exclusive events

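It is a consequence of the Kolmogorov Axioms that additivity also works for finite collections of mutually exclusive events:
$$P[A_1 \cup A_2 \cup \cdots \cup A_n] = P[A_1] + P[A_2] + \cdots + P[A_n] \quad \text{when } A_i \cap A_j = \emptyset \text{ for all } i \neq j$$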

Inferences from Kolmogorov

A probability measure satisfies these rules.
They can be deduced from the Kolmogorov Axioms.

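  • Negation: Can you find $P[A^c]$ but not $P[A]$? Use negation: $P[A] = 1 - P[A^c]$

  • Monotonicity: Probabilities grow when outcomes are added: $A \subseteq B \implies P[A] \leq P[B]$

  • Inclusion-Exclusion: A trick for resolving unions: $P[A \cup B] = P[A] + P[B] - P[A \cap B]$ (even when $A$ and $B$ are not exclusive!)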

Inclusion-Exclusion

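The principle of inclusion-exclusion generalizes to three events:
$$P[A \cup B \cup C] = P[A] + P[B] + P[C] - P[A \cap B] - P[A \cap C] - P[B \cap C] + P[A \cap B \cap C]$$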

The same pattern works for any number of events!

The pattern goes: “include singles” then “exclude doubles” then “include triples” then ...

Include, exclude, include, exclude, include, ...
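
The three-event pattern can be verified numerically on a small model. This is a sketch under assumed inputs: a single fair die roll with equally likely outcomes, and three overlapping events chosen only for illustration.

```python
from fractions import Fraction

# Assumed model: one fair die roll, all six outcomes equally likely.
S = set(range(1, 7))

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(S))

A = {2, 4, 6}
B = {4, 5, 6}
C = {1, 2, 4}

# Left side: probability of the union, computed directly.
lhs = prob(A | B | C)

# Right side: include singles, exclude doubles, include the triple.
rhs = (prob(A) + prob(B) + prob(C)
       - prob(A & B) - prob(A & C) - prob(B & C)
       + prob(A & B & C))

print(lhs, rhs)
```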

05 Illustration

Example - Lucia is Host or Player

03 - Lucia is Host or Player

The professor chooses three students at random for a game in a class of 40, one to be Host, one to be Player, one to be Judge. What is the probability that Lucia is either Host or Player?

Solution
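  1. Set up the probability model.

    • Label the students $1$ to $40$. Write $L$ for Lucia’s number.

    • Outcomes: assignments such as $(\text{Host}, \text{Player}, \text{Judge}) = (2, 5, 8)$.
      These are ordered triples with distinct entries in $\{1, 2, \dots, 40\}$.

    • Sample space: $S$ is the collection of all such distinct triples.

    • Events: any subset of $S$.

    • Probability measure: assume all outcomes are equally likely, so $P[\{s\}] = P[\{s'\}]$ for all $s, s' \in S$.

    • In total there are $40 \cdot 39 \cdot 38 = 59280$ triples of distinct numbers.

    • Therefore $P[\{s\}] = \frac{1}{40 \cdot 39 \cdot 38}$ for any specific outcome $s$.

    • Therefore $P[A] = \frac{|A|}{40 \cdot 39 \cdot 38}$ for any event $A$. (Recall $|A|$ is the number of outcomes in $A$.)

  2. Define the desired event.

    • Want to find $P[\text{“Lucia is Host or Player”}]$.

    • Define $A$ = “Lucia is Host” and $B$ = “Lucia is Player”. Thus:
      $A = \{(L, j, k) \in S\}$ and $B = \{(i, L, k) \in S\}$

    • So we seek $P[A \cup B]$.

  3. Compute the desired probability.

    • Importantly, $A \cap B = \emptyset$ (mutually exclusive).
      There are no outcomes in $S$ in which Lucia is both Host and Player.

    • By additivity, we infer $P[A \cup B] = P[A] + P[B]$.

    • Now compute $P[A]$.

      • There are $39 \cdot 38$ ways to choose $j$ and $k$ from the students besides Lucia.

      • Therefore $|A| = 39 \cdot 38$.

      • Therefore: $P[A] = \frac{39 \cdot 38}{40 \cdot 39 \cdot 38} = \frac{1}{40}$

    • Now compute $P[B]$. It is similar: $P[B] = \frac{1}{40}$.

    • Finally compute that $P[A] + P[B] = \frac{1}{20}$, so the answer is: $P[A \cup B] = \frac{1}{20}$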
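
The counting argument can be confirmed by brute-force enumeration of the sample space. In this sketch the students are labeled 1 to 40 and Lucia is arbitrarily taken to be student 1; that label and the variable names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import permutations

NUM_STUDENTS = 40
LUCIA = 1  # hypothetical label for Lucia

# Outcomes: ordered triples (host, player, judge) of distinct students.
outcomes = list(permutations(range(1, NUM_STUDENTS + 1), 3))

host = [o for o in outcomes if o[0] == LUCIA]
player = [o for o in outcomes if o[1] == LUCIA]
host_or_player = [o for o in outcomes if o[0] == LUCIA or o[1] == LUCIA]

p_host = Fraction(len(host), len(outcomes))
p_player = Fraction(len(player), len(outcomes))
p_union = Fraction(len(host_or_player), len(outcomes))

print(len(outcomes), p_host, p_player, p_union)
```

Because the two events share no outcomes, the directly counted union probability matches the sum of the two individual probabilities.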

Example - iPhones and iPads

04 - iPhones and iPads

At Mr. Jefferson’s University, 25% of students have an iPhone, 30% have an iPad, and 60% have neither.

What is the probability that a randomly chosen student has some iProduct? (Q1)

What about both? (Q2)

Solution
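  1. Set up the probability model.

    • A student is chosen at random: an outcome is the chosen student.

    • Sample space $S$ is the set of all students.

    • Write $O$ = “has iPhone” and $A$ = “has iPad” concerning the chosen student.

    • All students are equally likely to be chosen: therefore $P[E] = \frac{|E|}{|S|}$ for any event $E$.

    • Therefore $P[O] = 0.25$ and $P[A] = 0.30$.

    • Furthermore, $P[O^c \cap A^c] = 0.60$. This means 60% have “not iPhone AND not iPad”.

  2. Define the desired event.

    • Q1: $P[O \cup A]$

    • Q2: $P[O \cap A]$

  3. Compute the probabilities.

    • We do not believe $O$ and $A$ are exclusive.

    • Try: apply inclusion-exclusion: $P[O \cup A] = P[O] + P[A] - P[O \cap A]$

    • We know $P[O] = 0.25$ and $P[A] = 0.30$. So this formula, with given data, RELATES Q1 and Q2.

    • Notice the complements in $O^c \cap A^c$ and try Negation.

    • Negation: $P[O^c] = 0.75$ and $P[A^c] = 0.70$. DOESN’T HELP.

    • Try again: Negation: $P[(O^c \cap A^c)^c] = 1 - P[O^c \cap A^c] = 0.40$

    • And De Morgan (or a Venn diagram!): $(O^c \cap A^c)^c = O \cup A$

    • Therefore: $P[O \cup A] = P[(O^c \cap A^c)^c] = 0.40$

    • We have found Q1: $P[O \cup A] = 0.40$.

    • Applying the RELATION from inclusion-exclusion, we get Q2: $P[O \cap A] = P[O] + P[A] - P[O \cup A] = 0.25 + 0.30 - 0.40 = 0.15$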
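
The two steps (negation with De Morgan, then inclusion-exclusion) amount to a short arithmetic chain. The sketch below uses exact fractions to avoid floating-point noise; the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Given data from the problem statement.
p_iphone = Fraction(25, 100)
p_ipad = Fraction(30, 100)
p_neither = Fraction(60, 100)

# Q1: "some iProduct" is the complement of "neither" (negation + De Morgan).
p_some = 1 - p_neither

# Q2: rearrange inclusion-exclusion to solve for the intersection.
p_both = p_iphone + p_ipad - p_some

print(p_some, p_both)
```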

Conditional probability

06 Theory

Conditional probability

The conditional probability of “$B$ given $A$” is defined by:
$$P[B \mid A] = \frac{P[B \cap A]}{P[A]}$$

This conditional probability $P[B \mid A]$ represents the probability of event $B$ taking place given the assumption that $A$ took place. (All within the given probability model.) The definition requires $P[A] > 0$.

By letting the actuality of event $A$ be taken as a fixed hypothesis, we can define a conditional probability measure by plugging events into the slot of $B$:
$$P[\,\cdot \mid A] = \frac{P[\,\cdot \cap A]}{P[A]}$$

It is possible to verify each of the Kolmogorov axioms for this function, and therefore $P[\,\cdot \mid A]$ itself defines a bona fide probability measure.

Conditioning

What does it really mean?

Conceptually, $P[B \mid A]$ corresponds to creating a new experiment in which we run the old experiment and record data only those times that $A$ happened. Or, it corresponds to finding ourselves with knowledge or data that $A$ happened, and we seek our best estimates of the likelihoods of other events, based on our existing model and the actuality of $A$.

Mathematically, $P[\,\cdot \mid A]$ corresponds to restricting the probability function to outcomes in $A$, and renormalizing the values (dividing by $P[A]$) so that the total probability of all the outcomes (in $A$) is now $1$.

The definition of conditional probability can also be turned around and reinterpreted:

Multiplication rule

$$P[A \cap B] = P[A] \cdot P[B \mid A]$$

“The probability of $A$ AND $B$ equals the probability of $A$ times the probability of $B$-given-$A$.”

This principle generalizes to any events in sequence:

Generalized multiplication rule

$$P[A_1 \cap A_2 \cap \cdots \cap A_n] = P[A_1] \cdot P[A_2 \mid A_1] \cdot P[A_3 \mid A_1 \cap A_2] \cdots P[A_n \mid A_1 \cap \cdots \cap A_{n-1}]$$

The generalized rule can be verified like this. First substitute $A_2$ for $B$ and $A_1$ for $A$ in the original rule. Now repeat, substituting $A_3$ for $B$ and $A_1 \cap A_2$ for $A$ in the original rule, and combine with the first one, and you find the rule for triples. Repeat again with $A_4$ and $A_1 \cap A_2 \cap A_3$, combine with the triples, and you get quadruples.
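
As a worked instance of this verification, here is the case of three events written out. The labels $A_1, A_2, A_3$ are assumed here for illustration; the first equality applies the two-event rule with $A_1 \cap A_2$ in the role of the conditioning event, and the second applies it again to $P[A_1 \cap A_2]$.

$$
\begin{aligned}
P[A_1 \cap A_2 \cap A_3] &= P[A_1 \cap A_2] \cdot P[A_3 \mid A_1 \cap A_2] \\
&= P[A_1] \cdot P[A_2 \mid A_1] \cdot P[A_3 \mid A_1 \cap A_2]
\end{aligned}
$$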

07 Illustration

Exercise - Simplifying conditionals

05 - Simplifying conditionals inclusion

Let . Simplify the following values:

Example - Coin flipping: at least 2 heads

06 - Coin flipping: at least 2 heads

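Flip a fair coin 4 times and record the outcomes as sequences, like $HHTH$.

Let $A$ be the event that there are at least two heads, and $B$ the event that there is at least one heads.

First let’s calculate $P[A]$.

Define $A_2$, the event that there were exactly 2 heads, and $A_3$, the event of exactly 3, and $A_4$ the event of exactly 4. These events are exclusive, so:
$$P[A] = P[A_2] + P[A_3] + P[A_4]$$
Each term on the right can be calculated by counting, since all $2^4 = 16$ sequences are equally likely:
$$P[A_2] = \frac{6}{16}, \quad P[A_3] = \frac{4}{16}, \quad P[A_4] = \frac{1}{16}$$

Therefore, $P[A] = \frac{11}{16}$.

Now suppose we find out that “at least one heads definitely came up”. (Meaning that we know $B$.) For example, our friend is running the experiment and tells us this fact about the outcome.

Now what is our estimate of likelihood of $A$?

The formula for conditioning gives:
$$P[A \mid B] = \frac{P[A \cap B]}{P[B]}$$
Now $A \cap B = A$. (Any outcome with at least two heads automatically has at least one heads.) We already found that $P[A] = \frac{11}{16}$. To compute $P[B]$ we simply add the probability $P[A_1]$ of exactly one heads, which is $\frac{4}{16}$, to get $P[B] = \frac{15}{16}$.

Therefore:
$$P[A \mid B] = \frac{11/16}{15/16} = \frac{11}{15}$$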
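
The same conditional probability can be obtained by enumerating all 16 sequences and applying the definition directly. This is a sketch; the event names `at_least_two` and `at_least_one` are chosen here for readability.

```python
from fractions import Fraction
from itertools import product

# All sequences of four fair coin flips, equally likely.
outcomes = ["".join(seq) for seq in product("HT", repeat=4)]

at_least_two = {s for s in outcomes if s.count("H") >= 2}
at_least_one = {s for s in outcomes if s.count("H") >= 1}

def prob(event):
    """Probability under equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

# Definition of conditional probability: restrict to the condition, renormalize.
p_conditional = prob(at_least_two & at_least_one) / prob(at_least_one)

print(prob(at_least_two), prob(at_least_one), p_conditional)
```

Conditioning removes only the single outcome with no heads, which is why the denominator changes from 16 to 15.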

Example: Flip a coin, then roll dice

07 - Multiplication: flip a coin, then roll dice

Flip a coin. If the outcome is heads, roll two dice and add the numbers. If the outcome is tails, roll a single die and take that number. What is the probability of getting a tails AND a number at least 3?

Solution

This “two-stage” experiment lends itself to a solution using the multiplication rule for conditional probability.

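  1. Label the events of interest.
    • Let $H$ and $T$ be the events that the coin showed heads and tails, respectively.
    • Let $A_1, \dots, A_{12}$ be the events that the final number is $1, \dots, 12$, respectively.
    • The value we seek is $P[T \cap A_{\geq 3}]$, where $A_{\geq 3} = A_3 \cup \cdots \cup A_{12}$.
  2. Observe known (conditional) probabilities.
    • We know that $P[H] = \frac{1}{2}$ and $P[T] = \frac{1}{2}$.
    • We know that $P[A_5 \mid T] = \frac{1}{6}$, for example, or that $P[A_2 \mid H] = \frac{1}{36}$.
  3. Apply “multiplication” rule.
    • This rule gives: $P[T \cap A_{\geq 3}] = P[T] \cdot P[A_{\geq 3} \mid T]$
    • We know $P[T] = \frac{1}{2}$ and can see by counting that $P[A_{\geq 3} \mid T] = \frac{4}{6} = \frac{2}{3}$.
    • Therefore $P[T \cap A_{\geq 3}] = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}$.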
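
A short sketch of the two-stage computation, assuming a fair coin and a fair six-sided die (the fairness assumptions and variable names are stated here, not taken from a library or the original text):

```python
from fractions import Fraction

# Stage 1: fair coin (assumption).
p_tails = Fraction(1, 2)

# Stage 2 given tails: one fair die; count faces showing at least 3.
die_faces = range(1, 7)
p_at_least_3_given_tails = Fraction(sum(1 for face in die_faces if face >= 3), 6)

# Multiplication rule: P[T and (number >= 3)] = P[T] * P[number >= 3 | T].
p_tails_and_at_least_3 = p_tails * p_at_least_3_given_tails

print(p_at_least_3_given_tails, p_tails_and_at_least_3)
```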

Multiplication: draw two cards

08 - Multiplication: draw two cards

Two cards are drawn from a standard deck (without replacement).

What is the probability that the first is a 3, and the second is a 4?

Solution

This “two-stage” experiment lends itself to a solution using the multiplication rule for conditional probability.

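  1. Label events.
    • Write $A$ for the event that the first card is a 3.
    • Write $B$ for the event that the second card is a 4.
    • We seek $P[A \cap B]$.
  2. Write down knowns.
    • We know $P[A] = \frac{4}{52}$. (It does not depend on the second draw.)
    • Easily find $P[B \mid A]$.
      • If the first is a 3, then there are four 4s remaining and 51 cards.
      • So $P[B \mid A] = \frac{4}{51}$.
  3. Apply multiplication rule.
    • Multiplication rule: $P[A \cap B] = P[A] \cdot P[B \mid A]$
    • Therefore $P[A \cap B] = \frac{4}{52} \cdot \frac{4}{51} = \frac{16}{2652} = \frac{4}{663}$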
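
The multiplication-rule answer can be cross-checked by listing every ordered pair of distinct cards. In this sketch the rank and suit labels are arbitrary strings chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in RANKS for suit in SUITS]

# Ordered draws of two distinct cards (without replacement).
draws = list(permutations(deck, 2))

# Favorable: first card is a 3 and second card is a 4.
favorable = [d for d in draws if d[0][0] == "3" and d[1][0] == "4"]

p_enumerated = Fraction(len(favorable), len(draws))
p_multiplication = Fraction(4, 52) * Fraction(4, 51)

print(len(draws), len(favorable), p_enumerated)
```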

08 Theory

Division into Cases

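For any events $A$ and $B$:
$$P[B] = P[A] \cdot P[B \mid A] + P[A^c] \cdot P[B \mid A^c]$$

Interpretation: event $B$ may be divided along the lines of $A$, with some of $P[B]$ coming from the part in $A$ and the rest from the part in $A^c$.

Total Probability - Explanation
  • First divide $B$ itself into parts in and out of $A$: $B = (B \cap A) \cup (B \cap A^c)$
  • These parts are exclusive, so in probability we have: $P[B] = P[B \cap A] + P[B \cap A^c]$
  • Use the Multiplication rule to break up $P[B \cap A]$ and $P[B \cap A^c]$: $P[B \cap A] = P[A] \cdot P[B \mid A]$ and $P[B \cap A^c] = P[A^c] \cdot P[B \mid A^c]$
  • Now substitute in the prior formula: $P[B] = P[A] \cdot P[B \mid A] + P[A^c] \cdot P[B \mid A^c]$

This law can be generalized to any partition of the sample space $S$. A partition is a collection of events $A_1, \dots, A_n$ which are mutually exclusive and jointly exhaustive:
$$A_i \cap A_j = \emptyset \text{ for all } i \neq j, \qquad A_1 \cup A_2 \cup \cdots \cup A_n = S$$
The generalized formulation of Total Probability for a partition is:

Law of Total Probability

For a partition $A_1, \dots, A_n$ of the sample space $S$:
$$P[B] = \sum_{i=1}^{n} P[A_i] \cdot P[B \mid A_i]$$

Division into Cases is just the Law of Total Probability after setting $A_1 = A$ and $A_2 = A^c$.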

09 Illustration

Exercise - Marble transferred, marble drawn

09 - Marble transferred, marble drawn

Setup:

  • Bin 1 holds five red and four green marbles.
  • Bin 2 holds four red and five green marbles.

Experiment:

  • You take a random marble from Bin 1 and put it in Bin 2 and shake Bin 2.
  • Then you draw a random marble from Bin 2 and look at it.

What is the probability that the marble you look at is red?

Bayes’ Theorem

10 Theory

Bayes’ Theorem

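For any events $A$ and $B$:
$$P[A \mid B] = P[B \mid A] \cdot \frac{P[A]}{P[B]}$$

  • Bayes’ Theorem is also called Bayes’ Rule sometimes.
Bayes’ Theorem - Derivation

Start with the observation that $A \cap B = B \cap A$, or event “$A$ AND $B$” equals event “$B$ AND $A$”.

Apply the multiplication rule to each order:
$$P[A \cap B] = P[A] \cdot P[B \mid A], \qquad P[B \cap A] = P[B] \cdot P[A \mid B]$$

Equate them and rearrange:
$$P[B] \cdot P[A \mid B] = P[A] \cdot P[B \mid A] \quad \implies \quad P[A \mid B] = P[B \mid A] \cdot \frac{P[A]}{P[B]}$$

The main application of Bayes’ Theorem is to calculate $P[A \mid B]$ when it is easy to calculate $P[B \mid A]$ from the problem setup. Often this occurs in multi-stage experiments where event $A$ describes outcomes of an intermediate stage.

Note: these notes use alphabetical order $A$, $B$ as a mnemonic for temporal or logical order, i.e. that $A$ comes first in time, or otherwise that $A$ is the prior conditional from which it is easier to calculate $B$.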

11 Illustration

Example - Bayes’ Theorem - COVID tests

10 - Bayes’ Theorem: COVID tests

Assume that 0.5% of people have COVID. Suppose a COVID test gives a (true) positive on 96% of patients who have COVID, but gives a (false) positive on 2% of patients who do not have COVID. Bob tests positive. What is the probability that Bob has COVID?

Solution
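  1. Label events.
    • Event $A_P$: Bob is actually positive for COVID
    • Event $A_N$: Bob is actually negative; note $A_N = A_P^c$
    • Event $T_P$: Bob tests positive
    • Event $T_N$: Bob tests negative; note $T_N = T_P^c$
  2. Identify knowns.
    • Know: $P[T_P \mid A_P] = 0.96$
    • Know: $P[T_P \mid A_N] = 0.02$
    • Know: $P[A_P] = 0.005$ and therefore $P[A_N] = 0.995$
    • We seek: $P[A_P \mid T_P]$
  3. Translate Bayes’ Theorem.
    • Using $A = A_P$ and $B = T_P$ in the formula: $P[A_P \mid T_P] = P[T_P \mid A_P] \cdot \frac{P[A_P]}{P[T_P]}$
    • We know all values on the right except $P[T_P]$.
  4. Use Division into Cases.
    • Observe: $T_P = (T_P \cap A_P) \cup (T_P \cap A_N)$
    • Division into Cases yields: $P[T_P] = P[A_P] \cdot P[T_P \mid A_P] + P[A_N] \cdot P[T_P \mid A_N]$
    • Important to notice this technique!
      • It is a common element of Bayes’ Theorem application problems.
      • It is frequently needed for the denominator.
    • Plug in data and compute: $P[T_P] = 0.005 \cdot 0.96 + 0.995 \cdot 0.02 = 0.0048 + 0.0199 = 0.0247$
  5. Compute answer.
    • Plug in and compute: $P[A_P \mid T_P] = 0.96 \cdot \frac{0.005}{0.0247} \approx 0.194 \approx 19\%$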

Intuition - COVID testing

Some people find the low number surprising. In order to repair your intuition, think about it like this: roughly 2.5% of tests are positive, with roughly 2% coming from false positives, and roughly 0.5% from true positives. The true ones make up only $\frac{0.5}{2.5} = 20\%$ of the positive results!

(This rough approximation is by assuming $0.96 \approx 1$ and $0.995 \approx 1$.)

If two tests both come back positive, and the two results are independent given Bob’s true status, the probability of COVID rises to about 92%.

If only people with symptoms are tested, so that, say, 20% of those tested have COVID, that is, $P[A_P] = 0.20$, then one positive test implies a COVID probability of about 92%.
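
The three scenarios above can be recomputed with one small function that combines Bayes’ Theorem with Division into Cases for the denominator. This is a sketch: the function name `posterior` and its parameter names are hypothetical, and the two-test case additionally assumes the test results are independent given the true status, so the first posterior can be reused as the second prior.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P[has condition | positive test], via Bayes' Theorem.

    The denominator P[positive] comes from Division into Cases.
    """
    p_positive = prior * sensitivity + (1 - prior) * false_positive_rate
    return prior * sensitivity / p_positive

# One positive test in the general population (0.5% prevalence).
one_test = posterior(0.005, 0.96, 0.02)

# Second positive test: the first posterior becomes the new prior
# (assumes conditionally independent test results).
two_tests = posterior(one_test, 0.96, 0.02)

# One positive test among symptomatic people (20% prevalence).
symptomatic = posterior(0.20, 0.96, 0.02)

print(f"{one_test:.3f} {two_tests:.3f} {symptomatic:.3f}")
```

The prior drives the result: the same test moves a 0.5% prior to roughly 19%, but a 20% prior to roughly 92%.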

Exercise - Bayes’ Theorem and Multiplication: Inferring bin from marble

11 - Inferring bin from marble

There are marbles in bins in a room:

  • Bin 1 holds 7 red and 5 green marbles.
  • Bin 2 holds 4 red and 3 green marbles.

Your friend goes in the room, shuts the door, and selects a random bin, then draws a random marble. (Equal odds for each bin, then equal odds for each marble in that bin.) He comes out and shows you a red marble.

What is the probability that this red marble was taken from Bin 1?

Independence

12 Theory

Two events are independent when information about one of them does not change our probability estimate for the other. Mathematically, there are three ways to express this fact:

Independence

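Events $A$ and $B$ are independent when these (logically equivalent) equations hold:

  • $P[B \mid A] = P[B]$

  • $P[A \mid B] = P[A]$

  • $P[A \cap B] = P[A] \cdot P[B]$

(The first two require $P[A] > 0$ and $P[B] > 0$, respectively, in order to be defined.)

  • The last equation is symmetric in $A$ and $B$.
    • Check: $A \cap B = B \cap A$ and $P[A] \cdot P[B] = P[B] \cdot P[A]$
    • This symmetric version is the preferred definition of the concept.
Multiple-independence

A collection of events $A_1, \dots, A_n$ is mutually independent when every subcollection $A_{i_1}, \dots, A_{i_k}$ satisfies:
$$P[A_{i_1} \cap \cdots \cap A_{i_k}] = P[A_{i_1}] \cdots P[A_{i_k}]$$

A potentially weaker condition for a collection $A_1, \dots, A_n$ is called pairwise independence, which holds when all 2-member subcollections are independent:
$$P[A_i \cap A_j] = P[A_i] \cdot P[A_j] \quad \text{for all } i \neq j$$

One could also define $3$-member independence, or $n$-member independence. Plain ‘independence’ means any-member independence.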

13 Illustration

Exercise - Independence and complements

12 - Independence and complements

Prove that these are logically equivalent statements:

  • $A$ and $B$ are independent
  • $A$ and $B^c$ are independent
  • $A^c$ and $B^c$ are independent

Make sure you demonstrate both directions of each equivalency.

Example - Checking independence by hand

13 - Independence by hand: red and green marbles

A bin contains 4 red and 7 green marbles. Two marbles are drawn.

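Let $R_1$ be the event that the first marble is red, and let $G_2$ be the event that the second marble is green.

  • (a) Show that $R_1$ and $G_2$ are independent if the marbles are drawn with replacement.
  • (b) Show that $R_1$ and $G_2$ are not independent if the marbles are drawn without replacement.
Solution

(a) With replacement.

  1. Identify knowns.
    • Know: $P[R_1] = \frac{4}{11}$
    • Know: $P[G_2] = \frac{7}{11}$
  2. Compute both sides of independence relation.
    • Relation is $P[R_1 \cap G_2] = P[R_1] \cdot P[G_2]$
    • Right side is $\frac{4}{11} \cdot \frac{7}{11} = \frac{28}{121}$
    • For $P[R_1 \cap G_2]$, have $4 \cdot 7 = 28$ ways to get $R_1 \cap G_2$, and $11^2 = 121$ total outcomes.
    • So left side is $\frac{28}{121}$, which equals the right side.

(b) Without replacement.

  1. Identify knowns.
    • Know: $P[R_1] = \frac{4}{11}$ and therefore $P[R_1^c] = \frac{7}{11}$
    • We seek: $P[G_2]$ and $P[R_1 \cap G_2]$
  2. Find $P[G_2]$ using Division into Cases.
    • Division into cases: $G_2 = (G_2 \cap R_1) \cup (G_2 \cap R_1^c)$
    • Therefore: $P[G_2] = P[R_1] \cdot P[G_2 \mid R_1] + P[R_1^c] \cdot P[G_2 \mid R_1^c]$
    • Find these by counting and compute: $P[G_2] = \frac{4}{11} \cdot \frac{7}{10} + \frac{7}{11} \cdot \frac{6}{10} = \frac{70}{110} = \frac{7}{11}$
  3. Find $P[R_1 \cap G_2]$ using Multiplication rule.
    • Multiplication rule (implicitly used above already): $P[R_1 \cap G_2] = P[R_1] \cdot P[G_2 \mid R_1] = \frac{4}{11} \cdot \frac{7}{10} = \frac{28}{110}$
  4. Compare both sides.
    • Left side: $P[R_1 \cap G_2] = \frac{28}{110}$
    • Whereas, right side: $P[R_1] \cdot P[G_2] = \frac{4}{11} \cdot \frac{7}{11} = \frac{28}{121}$
    • But $\frac{28}{110} \neq \frac{28}{121}$ so $P[R_1 \cap G_2] \neq P[R_1] \cdot P[G_2]$ and they are not independent.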
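
Both parts can be checked by enumerating ordered pairs of marbles. In this sketch each marble gets an index so that marbles of the same color stay distinguishable; the helper `both_sides` is a hypothetical name and returns the two sides of the product relation for “first is red” and “second is green”.

```python
from fractions import Fraction
from itertools import permutations, product

# 4 red and 7 green marbles, indexed so each marble is distinct.
marbles = list(enumerate(["R"] * 4 + ["G"] * 7))

def both_sides(pairs):
    """Return (P[first red AND second green], P[first red] * P[second green])."""
    pairs = list(pairs)
    total = len(pairs)
    p_first_red = Fraction(sum(1 for a, b in pairs if a[1] == "R"), total)
    p_second_green = Fraction(sum(1 for a, b in pairs if b[1] == "G"), total)
    p_joint = Fraction(sum(1 for a, b in pairs if a[1] == "R" and b[1] == "G"), total)
    return p_joint, p_first_red * p_second_green

# With replacement: any ordered pair, repeats allowed (121 outcomes).
with_replacement = both_sides(product(marbles, repeat=2))

# Without replacement: ordered pairs of distinct marbles (110 outcomes).
without_replacement = both_sides(permutations(marbles, 2))

print(with_replacement, without_replacement)
```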

Tree diagrams

14 Theory

A tree diagram depicts the components of a multi-stage experiment. Nodes, or branch points, represent sources of randomness.

[Figure: example tree diagram]

An outcome of the experiment is represented by a pathway taken from the root (left-most node) to a leaf (right-most node). The branch chosen at a given node junction represents the outcome of the “sub-experiment” constituting that branch point. So a pathway encodes the outcomes of all sub-experiments.

Each branch from a node is labeled with a probability number. This is the probability that the sub-experiment of that node has the outcome of that branch.

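  • The probability label on some branch is the conditional probability of that branch, assuming the pathway from root to prior node.
    • For instance, in a two-stage tree with first-stage branches $A$, $A^c$ and second-stage branches $B$, $B^c$, the branch from node $A$ to $B$ is labeled $P[B \mid A]$.
    • Therefore, branch labels from a given node sum to 1. (Law of Total Probability)
  • The probability of a given (overall) outcome is the product of the probabilities on each branch of the pathway to that outcome.
    • Makes sense, because (e.g.): $P[A \cap B] = P[A] \cdot P[B \mid A]$
    • More generally: remember that (e.g.): $P[A \cap B \cap C] = P[A] \cdot P[B \mid A] \cdot P[C \mid A \cap B]$
    • This overall outcome probability may be written at the leaf.

One can also use a tree diagram to remember quickly how to calculate certain probabilities.

For example, what is $P[B]$ in such a diagram?
Answer: add up the pathway probabilities (leaf numbers) terminating in $B$. That makes $P[B] = P[A] \cdot P[B \mid A] + P[A^c] \cdot P[B \mid A^c]$

For example, what is $P[A \mid B]$?
Answer: divide the leaf probability of $A \cap B$ by the total probability of $B$. That makes:
$$P[A \mid B] = \frac{P[A] \cdot P[B \mid A]}{P[A] \cdot P[B \mid A] + P[A^c] \cdot P[B \mid A^c]}$$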

15 Illustration

Example - Tree diagrams: Marble transferred, marble drawn

14 - Marble transferred, marble drawn

Setup:

  • Bin 1 holds five red and four green marbles.
  • Bin 2 holds four red and five green marbles.

Experiment:

  • You take a random marble from Bin 1 and put it in Bin 2 and shake Bin 2.
  • Then you draw a random marble from Bin 2 and look at it.

Questions:

  • (a) What is the probability you draw a red marble?
  • (b) Supposing that you drew a red marble, what is the probability that a red marble was transferred?
Solution
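  1. Construct the tree diagram.
    • Identify sub-experiments, label events, compute probabilities:
      [Figure: tree diagram for the marble transfer experiment]
    • First stage (transfer): $P[R_1] = \frac{5}{9}$, $P[G_1] = \frac{4}{9}$, where $R_1$, $G_1$ mean a red or green marble was transferred.
    • Second stage (draw from Bin 2, now holding 10 marbles): $P[R_2 \mid R_1] = \frac{5}{10}$ and $P[R_2 \mid G_1] = \frac{4}{10}$, where $R_2$ means a red marble was drawn.
  2. For (a), compute $P[R_2]$.
    • Add up leaf numbers for $R_2$ at leaf: $P[R_2] = \frac{5}{9} \cdot \frac{5}{10} + \frac{4}{9} \cdot \frac{4}{10} = \frac{25}{90} + \frac{16}{90} = \frac{41}{90}$
  3. For (b), compute $P[R_1 \mid R_2]$.
    • Conditional probability: $P[R_1 \mid R_2] = \frac{P[R_1 \cap R_2]}{P[R_2]}$
    • Plug in data and compute: $P[R_1 \mid R_2] = \frac{25/90}{41/90} = \frac{25}{41}$
    • Interpretation: mass of desired pathway over mass of possible pathways.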
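
The tree computation translates directly into arithmetic on branch labels. The sketch below hard-codes the branch probabilities implied by the setup (5 red and 4 green in Bin 1; Bin 2 holds 10 marbles after the transfer); the variable names are assumptions made for the example.

```python
from fractions import Fraction

# First-stage branches: color of the transferred marble.
p_red_transferred = Fraction(5, 9)
p_green_transferred = Fraction(4, 9)

# Second-stage branches: drawing red from Bin 2 after the transfer.
p_red_drawn_given_red = Fraction(5, 10)    # Bin 2 now has 5 red, 5 green
p_red_drawn_given_green = Fraction(4, 10)  # Bin 2 now has 4 red, 6 green

# Leaf probabilities: product of branch labels along each pathway.
leaf_red_red = p_red_transferred * p_red_drawn_given_red
leaf_green_red = p_green_transferred * p_red_drawn_given_green

# (a) Add the leaves that end in "red drawn".
p_red_drawn = leaf_red_red + leaf_green_red

# (b) Mass of the desired pathway over mass of all pathways ending in red.
p_red_transferred_given_red_drawn = leaf_red_red / p_red_drawn

print(p_red_drawn, p_red_transferred_given_red_drawn)
```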

Counting

16 Theory

In many “games of chance”, it is assumed by symmetry principles that all outcomes are equally likely. From this assumption we infer the rule for $P[A]$:
$$P[A] = \frac{|A|}{|S|}$$
In words: the probability of event $A$ is the number of outcomes in $A$ divided by the number of possible outcomes.

When this formula applies, it is important to be able to count total outcomes, as well as outcomes satisfying various conditions.

Permutations

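Permutations count the number of ordered lists one can form from some items. For a list of $k$ items taken from a total collection of $n$, the number of permutations is:
$$\frac{n!}{(n-k)!}$$

To see where this comes from:
There are $n$ choices for the first item, then $n-1$ for the second, then ... then $n-k+1$ for the $k^{\text{th}}$ item. So the number is $n(n-1)\cdots(n-k+1)$. Observe:
$$\frac{n!}{(n-k)!} = \frac{n(n-1)\cdots(n-k+1)\cdot(n-k)\cdots 2 \cdot 1}{(n-k)\cdots 2 \cdot 1} = n(n-1)\cdots(n-k+1)$$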

Combinations, binomial coefficient

Combinations count the number of sets (ignoring order) one can form from some items. We define a notation for it like this:
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
This counts the number of sets of $k$ distinct elements taken from a total collection of $n$ items.

Another name for combinations is the binomial coefficient.

This formula can be derived from the formula for permutations. The possible permutations can be partitioned into combinations: each combination gives a set, and by specifying an ordering of elements in the set, we get a permutation. For a set of $k$ elements taken from $n$ items, there are $k!$ ways to put them into a specific order. So the number of permutations must be a factor of $k!$ greater than the number of combinations:
$$\frac{n!}{(n-k)!} = k! \cdot \binom{n}{k}$$

This notation, $\binom{n}{k}$, is also called the binomial coefficient because it provides the coefficients of a binomial expansion:
$$(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$$
For example:
$$(x+y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4$$

There are also ‘higher’ combinations:

Multinomial coefficient

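The general multinomial coefficient is defined by the formula:
$$\binom{n}{k_1, k_2, \dots, k_r} = \frac{n!}{k_1!\, k_2! \cdots k_r!}$$

where $k_1 + k_2 + \cdots + k_r = n$.

The multinomial coefficient measures the number of ways to partition $n$ items into sets with sizes $k_1, k_2, \dots, k_r$, respectively.

Notice that $\binom{n}{k} = \binom{n}{k, n-k}$ so we already defined these values with binomial coefficients. But with $r \geq 3$, we have new values. They correspond to the coefficients in multinomial expansions. For example $\binom{n}{k_1, k_2, k_3}$ gives coefficients for $(x+y+z)^n$.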
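
These counting formulas are available in Python’s standard library (`math.perm` and `math.comb`, Python 3.8 or later). The `multinomial` helper below is not a library function; it is a hypothetical name defined here to illustrate the formula.

```python
import math

def multinomial(*sizes):
    """Number of ways to split sum(sizes) items into groups of the given sizes."""
    result = math.factorial(sum(sizes))
    for size in sizes:
        result //= math.factorial(size)  # each division is exact
    return result

n, k = 5, 2

permutations_count = math.perm(n, k)   # ordered lists of k items from n
combinations_count = math.comb(n, k)   # unordered sets of k items from n

# Permutations exceed combinations by a factor of k!.
ratio_holds = permutations_count == math.factorial(k) * combinations_count

# A binomial coefficient is a multinomial coefficient with two groups.
binomial_as_multinomial = multinomial(k, n - k)

# Coefficient of x*y*z^2 in (x + y + z)^4.
trinomial_coefficient = multinomial(1, 1, 2)

print(permutations_count, combinations_count, binomial_as_multinomial, trinomial_coefficient)
```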

17 Illustration

Exercise - Combinations: Counting teams with Cooper

15 - Counting teams with Cooper

A team of 3 student volunteers is formed at random from a class of 40. What is the probability that Cooper is on the team?

Example - Combinations: Groups with Haley and Hugo

16 - Haley and Hugo from 2 groups of 3

The class has 40 students. Suppose the professor chooses 3 students Wednesday at random, and again 3 on Friday. What is the probability that Haley is chosen Wednesday and Hugo on Friday?

Solution
  1. Count total outcomes.
    • Have $\binom{40}{3}$ possible groups chosen Wednesday.
    • Have $\binom{40}{3}$ possible groups chosen Friday.
    • Therefore $\binom{40}{3}^2$ possible pairs of groups in total.
  2. Count desired outcomes.
    • Groups of 3 with Haley are same as groups of 2 taken from the $39$ others.
    • Therefore have $\binom{39}{2}$ groups that contain Haley.
    • Have $\binom{39}{2}$ groups that contain Hugo.
    • Therefore $\binom{39}{2}^2$ total desired outcomes.
  3. Compute probability.
    • Let $E$ label the desired event.
    • Use formula: $P[E] = \frac{|E|}{|S|}$
    • Therefore: $P[E] = \frac{\binom{39}{2}^2}{\binom{40}{3}^2} = \left(\frac{741}{9880}\right)^2 = \left(\frac{3}{40}\right)^2 = \frac{9}{1600} \approx 0.0056$
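
The counts in this solution can be evaluated with `math.comb`. This is a brief sketch with illustrative variable names; it treats the Wednesday and Friday selections as separate groups of 3 drawn from the full class of 40.

```python
from fractions import Fraction
from math import comb

CLASS_SIZE = 40
GROUP_SIZE = 3

# Total outcomes: one group for Wednesday and one for Friday.
total_pairs = comb(CLASS_SIZE, GROUP_SIZE) ** 2

# A group containing a named student = choose the other 2 from the remaining 39.
groups_with_named_student = comb(CLASS_SIZE - 1, GROUP_SIZE - 1)
desired_pairs = groups_with_named_student ** 2

p_haley_then_hugo = Fraction(desired_pairs, total_pairs)

print(groups_with_named_student, comb(CLASS_SIZE, GROUP_SIZE), p_haley_then_hugo)
```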

Example - Counting VA license plates

17 - Counting VA license plates

A VA license plate has three letters (with no I, O, or Q) followed by four numerals. A random plate is seen on the road.

  • (a) What is the probability that the numerals are in increasing order?
  • (b) What is the probability that at least one number is repeated?
Solution

(a)

  1. Count ways to have 4 numerals in increasing order.
    • Any four distinct numerals have a single order that’s increasing.
    • There are $\binom{10}{4} = 210$ ways to choose 4 numerals from 10 options.
  2. Count ways to choose the 3 letters, excluding I, O, Q.
    • 26 total letters, 3 excluded, thus 23 options.
    • Repetition allowed, thus $23^3$ possibilities.
  3. Count total plates with increasing numerals.
    • Multiply the options: $23^3 \cdot \binom{10}{4}$
  4. Count total plates.
    • Have $23^3$ options for letters.
    • Have $10^4$ options for numbers.
    • Thus $23^3 \cdot 10^4$ possible plates.
  5. Compute probability.
    • Let $E$ label the event that a plate has increasing numerals.
    • Use the formula: $P[E] = \frac{|E|}{|S|}$
    • Therefore: $P[E] = \frac{23^3 \cdot 210}{23^3 \cdot 10^4} = \frac{210}{10000} = 0.021$

(b)

  1. Count plates with at least one number repeated.
    • “At least” is hard! Try complement: “no repeats”.
    • Let $N$ be event that no numbers are repeated. All distinct.
    • Count possibilities: $|N| = 23^3 \cdot 10 \cdot 9 \cdot 8 \cdot 7 = 23^3 \cdot 5040$
    • Total license plates is still $23^3 \cdot 10^4$.
    • Therefore, license plates with at least one number repeated: $|N^c| = 23^3 \cdot 10^4 - 23^3 \cdot 5040 = 23^3 \cdot 4960$
  2. Compute probability.
    • Desired outcomes over total outcomes: $P[N^c] = \frac{23^3 \cdot 4960}{23^3 \cdot 10^4} = 0.496$
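
Since the letter choices contribute the same factor of $23^3$ to every count, both answers depend only on the four numerals. The sketch below enumerates all $10^4$ numeral sequences and counts directly; “increasing” is interpreted as strictly increasing, which matches the counting argument above.

```python
from fractions import Fraction
from itertools import product

# All sequences of four numerals, each from 0 to 9.
numeral_sequences = list(product(range(10), repeat=4))

# (a) Strictly increasing sequences.
increasing = [d for d in numeral_sequences if d[0] < d[1] < d[2] < d[3]]

# (b) Complement approach: count sequences with all numerals distinct.
no_repeats = [d for d in numeral_sequences if len(set(d)) == 4]

p_increasing = Fraction(len(increasing), len(numeral_sequences))
p_some_repeat = 1 - Fraction(len(no_repeats), len(numeral_sequences))

print(len(increasing), len(no_repeats), float(p_increasing), float(p_some_repeat))
```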